Abstract

Summary: MALDIquant is an R package providing a complete and modular analysis
pipeline for quantitative analysis of mass spectrometry data. MALDIquant is
specifically designed with application in clinical diagnostics in mind and
implements sophisticated routines for importing raw data, preprocessing,
non-linear peak alignment, and calibration. It also handles technical replicates
as well as spectra with unequal resolution.

Availability: MALDIquant and its associated R packages readBrukerFlexData and
readMzXmlData are freely available from the R archive CRAN
(http://cran.r-project.org). The software is distributed under the GNU General
Public License (version 3 or later) and is accompanied by example files and
data. Additional documentation is available from http://strimmerlab.org/.
1 Introduction

Mass spectrometry profiling is becoming an increasingly important tool in clinical
diagnostics, for example to identify biomarkers for cancer (e.g. Fiedler et al. 2009).
As with other high-throughput technologies, sophisticated statistical algorithms
are essential in the analysis of spectrometry data (Morris et al. 2010).



We have developed MALDIquant to provide a complete open source analysis pipeline
on the R platform (R Development Core Team 2012), comprising all steps from import
of raw data through preprocessing (e.g. baseline removal), peak detection, and
non-linear peak alignment to calibration of mass spectra. MALDIquant is written as
a standalone package using S4 object-oriented programming to facilitate further
extension.

MALDIquant was initially developed for clinical proteomics using MALDI
(Matrix-Assisted Laser Desorption/Ionization) technology. However, the algorithms
implemented in MALDIquant are generic and may be equally applied to other 2D mass
spectrometry data.

2 Distinctive Features 

In comparison with related R packages for mass spectrometry analysis, MALDIquant
features a number of unique capabilities. In particular, it implements a sophisticated
non-linear peak alignment algorithm (Wang et al. 2010; He et al. 2011) as well as a
calibration procedure for normalization of peak intensities across spectra that is modeled
on a related method for sequence count data (Anders and Huber 2010). In addition,
MALDIquant supports the analysis of technical replicates and of spectra with unequal
resolution, a crucial feature in clinical mass spectrometry where spectra from multiple
sources need to be compared.

3 Details on Algorithms 

An example workflow for mass spectrometry analysis using MALDIquant is depicted in 
Fig. 1, starting with a raw unprocessed MALDI spectrum (A), followed by smoothing,
baseline correction and peak detection (B), local alignment of peaks across spectra by 
warping (C-E), and merging and visualization (F). In the following we briefly provide 
some background on the respective algorithms. 

3.1 Data import 

MALDIquant is carefully designed to be independent of any specific mass spectrometry 
hardware. Nonetheless, native input of binary data files (as well as complete folder 
hierarchies) from Bruker *flex series instruments and input of the mzXML data format is 
supported via the associated R packages readBrukerFlexData and readMzXmlData.






Figure 1: Example of MALDIquant output: A: raw spectrum; B: variance-stabilized, 
smoothed, baseline-corrected spectrum with detected peaks; C: fitted warping function 
for peak alignment; D: four unaligned peaks; E: four aligned peaks; F: merged spectrum 
with detected and labeled peaks. 



3.2 Data preprocessing 

For preprocessing spectral data MALDIquant offers a complete set of routines for
smoothing, variance stabilization, baseline correction, and peak detection.
MALDIquant implements several approaches to adjust the baseline and by default uses
the SNIP algorithm (Ryan et al. 1988), which returns a smooth baseline and leads to
positive corrected intensities (Fig. 1B).
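To illustrate why a SNIP-type baseline never exceeds the measured signal, the following is a minimal base R sketch of the iterative clipping idea. It is an illustration under simplifying assumptions, not MALDIquant's own code: the function name snip_baseline and the argument iterations are hypothetical, and refinements described in the literature (such as a prior intensity transformation) are omitted.

```r
# Illustrative SNIP-style baseline estimate (hypothetical helper, base R only).
# y: numeric vector of intensities on an equally spaced grid.
# iterations: largest half-window (in data points) used for clipping.
snip_baseline <- function(y, iterations = 20) {
  n <- length(y)
  b <- y
  for (p in seq_len(iterations)) {
    if (2 * p >= n) break              # window no longer fits into the spectrum
    i <- (p + 1):(n - p)               # points that have neighbours at distance p
    # clip each point to the mean of its two neighbours at distance p;
    # the right-hand side is evaluated on the previous iteration's values
    b[i] <- pmin(b[i], (b[i - p] + b[i + p]) / 2)
  }
  b
}

# Synthetic example: decaying baseline plus one narrow peak at x = 5
x <- seq(0, 10, length.out = 201)
y <- 5 * exp(-x / 5) + 10 * exp(-(x - 5)^2 / 0.02)
corrected <- y - snip_baseline(y, iterations = 20)
```

Because each point is only ever replaced by a smaller value, the estimated baseline stays at or below the signal, so the corrected intensities are non-negative by construction.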



3.3 Peak alignment 

For comparison of peaks across different spectra it is essential to conduct alignment.
In order to match peaks belonging to the same mass, MALDIquant uses a statistical
regression-based approach combining the algorithms of Wang et al. (2010) and
He et al. (2011). Specifically, landmark peaks that occur in most spectra are
identified first. Subsequently, a non-linear warping function is computed for each
spectrum by fitting a local regression to the matched reference peaks (Fig. 1C-E).
This also makes it possible to merge aligned spectra from technical replicates. An
example of a merged spectrum with identified and labeled peaks is shown in Fig. 1F.
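The warping step can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: it assumes the landmark (reference) masses are already known, matches each landmark to the nearest observed peak within a relative tolerance, and fits the mass shift with stats::lowess. The names fit_warping and tolerance are hypothetical.

```r
# Illustrative landmark-based warping (hypothetical helper, base R only).
# peaks: observed peak masses of one spectrum.
# reference: masses of landmark peaks, assumed to be given.
# tolerance: maximal relative mass deviation accepted for a match.
fit_warping <- function(peaks, reference, tolerance = 0.002) {
  # nearest observed peak for every landmark
  nearest <- vapply(reference,
                    function(r) peaks[which.min(abs(peaks - r))],
                    numeric(1))
  keep <- abs(nearest - reference) / reference <= tolerance
  if (sum(keep) < 3) stop("too few matched landmark peaks")
  # local regression of the mass shift on the observed mass
  fit <- stats::lowess(nearest[keep], reference[keep] - nearest[keep])
  # warping function: add the interpolated shift to any mass value
  function(mass) mass + stats::approx(fit$x, fit$y, xout = mass, rule = 2)$y
}

# Synthetic example: all masses are measured 0.05 % too high
reference <- c(1000, 2000, 3000, 4000, 5000, 6000)
peaks <- c(reference, 3500) * 1.0005
warp <- fit_warping(peaks, reference)
aligned <- warp(peaks)
```

The returned function can be applied to all masses of a spectrum, not only to the landmarks, which is what moves non-landmark peaks (here the one near 3500) into register as well.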



3.4 Calibration 

Quantitative analysis of multiple spectra, e.g. to detect differentially expressed peaks,
requires calibration. In order to render peak intensities comparable across spectra, a
suitable scale factor for each individual spectrum needs to be determined. Experimentally,
quantification of intensities is performed by reference to spike-in samples. In the absence
of spike-ins, MALDIquant offers a way of calibrating relative intensities by adapting an
algorithm for calibrating next generation sequencing data (Anders and Huber 2010). In
this procedure a reference spectrum is first created using the median intensity of aligned
peaks from all spectra. Subsequently, a scale factor is computed for each spectrum by
employing a robust estimator of the overall ratio of the peak intensities of the
uncalibrated spectrum versus the reference spectrum. Additionally, calibration based on
total ion current (TIC) is available.
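The two calibration variants described above can be sketched in a few lines of base R. This is an illustration only: the function names are hypothetical, the input is assumed to be a matrix with one row per spectrum and one column per aligned peak, and the median is used as the robust ratio estimator, which is an assumption rather than something the text specifies.

```r
# Illustrative calibration helpers (hypothetical names, base R only).
# intensities: matrix, rows = spectra, columns = aligned peaks.

# Median-ratio scale factors relative to a median reference spectrum
scale_factors <- function(intensities) {
  reference <- apply(intensities, 2, median, na.rm = TRUE)
  apply(intensities, 1, function(s) median(s / reference, na.rm = TRUE))
}

# Divide every spectrum (row) by its scale factor
calibrate_median_ratio <- function(intensities) {
  intensities / scale_factors(intensities)
}

# TIC variant: divide every spectrum by the sum of its intensities
calibrate_tic <- function(intensities) {
  intensities / rowSums(intensities, na.rm = TRUE)
}

# Synthetic example: three spectra differing only by a global factor
base <- c(10, 20, 40, 80, 160)
m <- rbind(base, 2 * base, 0.5 * base)
f <- scale_factors(m)
calibrated <- calibrate_median_ratio(m)
```

In this toy example the three rows of the calibrated matrix coincide, since the spectra differ only in overall scale. With real data the median ratio is less affected by a few strongly changed peaks than the TIC, which sums over all intensities.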

3.5 Classification and feature selection 

Finally, the resulting calibrated peak intensity matrix may be exported for further use
in high-level statistical analysis, for instance classification and feature selection using
shrinkage discriminant analysis (Ahdesmäki and Strimmer 2010).



4 Conclusion 

MALDIquant is a versatile R package providing a flexible analysis pipeline for MALDI- 
TOF and other mass spectrometry data. It offers a number of distinctive features, in 
particular for alignment by non-linear warping and simultaneous calibration of peak 
intensities. 

An overview of its capabilities is given by running the included demo script:

library("MALDIquant")
demo("MALDIquant")